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Abstract 

This paper describes a method for the parallel solution of triangular 
sets of equations, appropriate when there are many right-hand sides. 

By preprocessing, the method can reduce the number of parallel steps 
required to solve Lx = b compared to parallel forward or backsolve. 
Applications are to iterative solvers with triangular preconditioners, to 
structural analysis, or to power systems applications, where there may 
be many right-hand sides (not all available a priori ). 

The inverse of L is represented as a product of sparse triangular 
factors. The problem considered in this paper is to find a factored 
representation of this inverse of L with the smallest number of factors 
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(or partitions)! subject to the requirement that no new nonzero ele- 
ments be created in the formation of these inverse factors. A method 
from an earlier reference is shown to solve this problem. This paper 
improves upon this method by contracting a permutation of the rows 
and columns of X that preserves triangularity and allows for the best 
possible such partition. 

A number of practical examples and algorithmic details are pre- 
sented. The parallelism attainable is illustrated by means of elimina- 
tion trees and cliques trees. 

Keywords: Sparsity! sparse matrices, numerical linear algebra, trian- 
gular matrices, matrix partitioning, parallel computation. 


1 Introduction 

This paper considers the problem of solving a nonsingular lower triangular 
set of sparse equations Lx = b in a parallel environment. (All our results 
extend in a trivial way to upper triangular systems, a fact we shall not 
mention again.) We consider the case where the problem must be solved for 
multiple right hand side vectors &, and these vectors are not available all at 
once. By preprocessing, we reduce the number of parallel steps required. 

Important applications where multiple right-hand-sides arise include fi- 
nite element applications, preconditioned iterative solvers for linear systems, 
solution of initial value problems by implicit methods, and variants of New- 
ton’s method for the solution of nonlinear equations. Often, L is a triangular 
factor computed by LU decomposition of a sparse matrix. In this case, L 
is a perfect elimination matrix (its graph is chordal). Our results do not 
require this. Thus, L may arise from an incomplete factorization or any 
other process. 

There are two possible approaches to the parallel solution of triangular 
systems of equations. The usual approach is to exploit whatever parallelism 
is av aila ble in the usual substitution algorithm [4, 10]. The second, which 
requires preprocessing, works with some representation of I" 1 . In sequential 
sparse matrix computation, substitution is universally favored because it 
retains the sparsity of the problem [7]. If L is dense, its inverse X” 1 is also 
dense. If X is sparse, its inverse is usually much denser than X itself [9]. 
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Here we consider a factorization 

L~ 1 = f[Qk ( 1 ) 

k=l 

with sparse factors. Such a factorization is possible in which the factors have 
no more nonzeros than X [2, 3, 9]. The chief advantage of a factorization 
of X- 1 is that all the necessary multiplications for the computation of QkX 
can be performed concurrently. Thus, it is possible to take advantage of 
more parallelism in the solution of these equations. The necessary additions 
can be done in at most log 2 m operations, where m is the largest number of 
nonzeros in a row of Q*. 

The remainder of this introduction reviews the use of partitioned inverses 
of X. Any triangular matrix X can be expressed as a product of elementary 
matrices: 

X = X 1 X 2 • • *X n _i (2) 

where 


(X fc )ifc = L ik for fc<t<n, 

(Lk)jj m 1 for 
(Lk)ij = 0 otherwise. 

The matrices X* are known as elementary lower triangular matrices. 
They can be grouped into several factors, 


m 

i=n^ 


1 


where 


Ph = +2 * * * L eu 


and 


0 = «o < *i < • • • < e»n — n — 1 . 


(3) 

(4) 

(5) 
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Here {efc}JL 0 is a monotonically increasing integer sequence. The factor Pk is 
lower triangular and is zero below its diagonal in all columns except columns 
Cfc_j + 1 through e*, where it is identical to L. Consider, for example: 


L 


10 0 0 
0 2 0 0 

3 2 3 0 

4 0 11 


By choosing e 0 = 0; ei = 2; e 2 = 3, L can be partitioned as follows: 


L — — PlPj 


10 0 0 
0 2 0 0 

3 2 10 

4 0 0 1 


10 0 0 
0 10 0 
0 0 3 0 
0 0 11 


The solution of the partitioned problem proceeds as follows. From (3) it 
follows that 


* = !-■*= n p i' h - 


( 6 ) 


Thus, one may compute * as follows: 


x = b; 

for k = 1 to m do 
z = P^x; 
od; 


Tn computing the matrix-vector products, we may exploit parallelism 
fully, computing the results in flog 2 max-row(P^' 1 )1 time with rj{ P k) pro- 
cessors. Here t)(X) denotes the number of nonzeroes in X and max-row(X) = 
rnar, (number of nonzeroes in row i of X) = max,(q(efX)). 
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Our viewpoint, which, is the use of a partitioned inverse of i, is computa- 
tionally quite similar to frontal methods for factorization [8] and to methods 
using supemodes [5]. 


2 Problem Definition 

This section defines some terminology and the two problems that are ad- 
dressed in this paper. 

Definition 1 X is invertible in place ifxij ^ 0 & (X -1 )^- ^ 0. 

The elementary lower triangular matrices are invertible in place. There 
therefore always is at least one partition (3) of L with factors that invert in 
place. A partition in which the factors P* are invertible in place is called a 
“no-fill’ 1 partition. The objective here is to find a no-fill partition with the 
smallest possible number of factors. 

Problem 1 Determine m and ei, • ■ •, e m _i such that each Pj,, 1 < k < 
m, is invertible in place and m is minimum. Denote this minimum m by 
min-fact(L). 

Definition 2 A solution to Problem 1 is a minimum no-fill partition of L. 

Problem 2 Determine a permutation matrix II such that in = HiII r re- 
mains lower triangular and min-fact(in) is minimized. 


2.1 Graph Theory Concepts 

Let G(L) be a digraph with vertices V = {1,2 , ...,n} and directed edges 
E = E(L) = {(*, j) | Lij / 0}. Think of the edge (i,j) as an arrow going 
from vertex j to vertex *. If L is lower tri an g ular , then for all edges (t, j) 
we have i > j. G(L) is therefore an acyclic digraph, or DAG. 

Definition 3 The indegree of a vertex i is the number of vertices j such 
that ( i,j ) € E. 
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The indegree of vertex * is the number of nonzeros in the t* 1 * row of L. 

Definition 4 Let G = G(L). For j £ V, madj(j) is the set of higher num- 
bered neighbors of j, i.e. the set {* > j | Lij 0}. 

The set madj(j) is the set of rows that are nonzero in the j* column of L. 

Definition 5 For ( i,j ) £ E we say that j is a predecessor of i and i is a 
successor of j. 

In anticipation of forthcoming theorems, we need to establish two lem- 
mas. 


Lemma 1 If L is a nonsingular lower triangular matrix, then (s', j ) is an 
edge of G{L~ l ) iff there is a path from j to i in G(L). 

Proof: For all 1 < t < n, (s', s') £ E{L)C\E(L~ X ), since both are nonsingular 
and triangular. We induct on s — j. For i > j 


0 = (LL- l )ij 

= E 

p=i 

= (iMj+iai; 1 

The second term above is nonzero iff ^ 0. The first term is nonzero 
iff at least one product L{ p L~j ^0, j < p < i — 1. By the inductive 
hypothesis (since for any such p, p — j < i — j) this holds iff there is an edge 
(s, p) and a path from j to p. Together, these constitute the required j -+ i 
path. CD 

Definition 6 The digraph G = (V, E) is closed if (t‘, k) £ E and ( k,j ) £ 

E^(iJ)£E. 
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Definition 7 Let G = (V,E) be a digraph associated with a triangular ma- 
trix L. Given a subset S ofV, define the column subgraph Gs = (V, Es) 
(where Es = {(*, j) € E\j G S}) as the graph of the lower triangular matrix 
obtained by zeroing all columns of L not in S . 


Theorem 2 Let a partition (5) and corresponding factorization (3) be given. 
The factors Pk are invertible in place iff each column subgraph G(Pk) = 
+i e *} is closed. 

Proof: By Lemma 1 (P^ 1 )ij ^ 0 iff there is a j — * * path in G(Pk )• The 
following are therefore equivalent: 

• Pk is invertible in place 

• (Pfc 1 )«i 5^ ® ^ (Pk)ij 0 

• for every j —* i path, (Pk)ij / 0 

• G(Pfc) is closed. 


□ 

Lemma 3 Let L\ and L^ be lower triangular. Then (i, j) 6 E(L\L 2 ) iff 
there is some k, with j < k < i such that ( i,k ) £ E(L\) and ( k,j ) £ E(L 2 ). 


Proof: For all t > j, (LiLzfij = X^=j(Li)»J«(L 2 )*j. See Figure 1. 


2.2 Partitioning the Inverse 

Consider Problem 1, that of representing L~ x by partitioning its factoriza- 
tion with the smallest possible number of factors that invert in place. The 
following algorithm was proposed by Alvarado, Yu and Betancourt (who call 
it PA2) [3]: 
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Figure 1: Proof of Lemma 3. 

Algorithm Pi: 

Input: L = Li-Lj • • • L n —i 
Output: A partition of L. 


* *— i; *-i; 

while (*' < n — 1) do 

let r be the largest integer greater than i such that L\ • ■ • L r 
is invertible in place; 

Pk *— Li • • ‘L v \ 

k*-k + l; i *- r + 1; 

od 


Alvarado, Yu, and Betancourt did not mention the issue of minima l i ty. 
Here we show that Algorithm PI determines a mini m u m no- fill partition. 

Lemma 4 If L\ ••• L r is invertible in place, then L?- • • L r is, too. 
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Proof: Obvious. 


Theorem 5 Algorithm Pi produces a minimum partition (it solves Problem 
!)■ 


Proof: Suppose Algorithm PI produces a partition L = Pi • • • P m • Clearly 
there does not exist a no-fill partition with e\ any larger than that produced 
by Algorithm PI. 

Now we show by induction on n that there is no better partition. Let 
L = P\ • * • Pm 1 he a different no-fill partition. Suppose Pi = P\. By the 
induction hypothesis, Algorithm PI when applied to P2 • • -Pm» produces a 
minimum no- fill partition, i.e. m — 1 factors is least possible. Thus, m 7 > m. 

On the other hand, perhaps P\ = with e[ < ei, i.e. Pi has 

fewer nonzero columns than does Pi . The matrix Q = P 2 • • • Pm # has a no- 
fill partition with (m' - 1) factors. By the previous lemma, we may remove 
the leftmost elementary lower triangular factor of ($ and still have an m* — 1 
factor no-fill partition. Continuing in this way we find such a partition of 
Q . But by the inductive hypothesis, any no-fill partition of Q has at least 
m - 1 factors. Thus, we again see that m 7 > m. □ 

Minimum partitions are not unique. For example, 


L = 


x 

X x 
XXX 

0 0 x x 


( 7 ) 


has minimum partitions: 


L = (hLlLJLi 


and 


L = (L 1 L 2 )(L 3 L 4). 
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Figure 2: Illustration of the miniinnm partition for a graph without re- 
ordering. Five factors are required. 

2.3 Triangular Permutation 

Consider the graph illustrated in Figure 2 for which X has a minimu m par- 
tition: 


L = (L l )(L 7 )(L s )(L 4 )(Lt)(L<L7) 

This partition has six factors. It is possible to symmetrically permute 
the rows mlnmna of X such that X remains a lower triangular and G(L) 
is as illustrated in Figure 3. A minimum partition of X for this new graph 
is: 


X = (X 1 Xj)(X s X 4 )(X 5 X«X7) 

This permuted partitioned matrix has only three factors. 
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Figure 3: Illustration of the minimum partition for a re-ordered graph. Only 
three factors are required. 


Definition 8 A tri*"gular ordering of an acyclic digraph is a numbering of 
vertices in which ( i,j ) € E implies i > j. 

Every acyclic digraph triangular orderings. The digraph of every 
triangular matrix is acyclic; moreover, its conventional ordering is tria n g ul a r . 
If G(I) is re-ordered so that vertex t is numbered v(t) and the new ordering 
is triangular, then the symmetric permutation of rows and columns that 
moves row i to row i/(i) leaves L triangular. 


2,4 Level of a Vertex 

Let G = {V, E) be a DAG. Define level(t), i 6 V as follows [6]: 

1. If indegree(t) = 0, then level(t) = 0. 

2. Otherwise, level(*) = 1 +level(t'), where * # is the vertex corresponding 
to i in the subgraph of G obtained by deleting all vertices at level 0 
and their outgoing edges. 



Level 0 


2 


Level 1 


Level 2 




Figure 4: Illus tration, of the concept of Level. 


Figure 4 illustrates the concept of leveL In general, level(t) is the length 
of the longest path that ends at i. The computation of levels is straightfor- 
ward, requiring 0(q(L)) time. 


3 Best Partitioning with Re-ordering 


This section describes two new algorithms for solving Problem 2. Each 
of them finds a triangular ordering of a DAG G such that the re-ordered 
graph a minimum no-fill partition with the smallest possible number 
of factors. The first algorithm (RP1) is a fairly straightforward “greedy” 
algorithm. The second algorithm, BP2, is a faster but more complicated 
implementation of the same idea. 
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Algorithm RPl (Re-order, Permute 1): 

Input : A directed, closed acyclic digraph G(L). 

Output: A permutation v : V — ► {1, . . . , n} and a partition of L. 


Compute level(v) for all v € V; 
max-level «— max r€ v(level(v)); 

* < — 1; Jb •« — 1; eo <— 0; 

while t < n do 

Sk «- 0; e fc 4- «; 

t *- min{j | there is an unnumbered vertex at level j}> 
repeat 

for every vertex v at level l do 

if (([Condition 1] r is unnumbered ) && 

([Condition 2] Every predecessor of v has been numbered ) && 
([Condition 3] Every successor of v is a successor of all 
u G Sk such that u is a predecessor of v) ) then 
v(v) <- *; *<-*-(- 1; 

Sk *— Sk U {v}; ek e k + 1; 
fl 
od 

i*-e + v, 

until L > max-level or no vertices at level l — 1 were included in Sk] 

Pk <— k*- k- 1-1; 

od 


The algorithm works by finding a partition V = UJLjSj, for which the 
column subgraphs Gs k are closed. Moreover, Si is a source node in the 
quotient graph, i.e. there are no edges directed into S\. If Si and its out- 
edges are removed, then Sj is a source node, etc. Thus, we are carrying 
out a partitio nin g of G as well as of L. We shall call the subsets Sk m this 
partition factors in analogy with their corresponding factors Pk of L. 

Proposition 1 The factor Si chosen by RPl is the largest possible , i.e., Si 
includes all of level 0; all allowable nodes from level 1; all allowable nodes 
from level 2/ etc. This is also true of SjJ > 1, which is the largest possible 
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Figure 5: Illustration of Condition 3. Successors of predecessors of a node 
must be successors of the node itself. Node 5 can be included in the same 
{actor as nodes 1 and 2, but node 6 cann ot be included with 3 and 4. Factors 
are denoted by shapes and shading. 

first factor of the graph obtained by deleting all verices and incident edges 
Of S\ y • • • , Sj—1 • 

This proposition is self-evident. We illustrate it with the example in 
Figures 5 and 6. These figures illustrate the elimination tree [13] of a matrix, 
and illustrate all successors of each node as well as the next element in the 
tree. Different node shapes are used to denote different factors. In Figure 5 
it is possible to co mbine nodes 1, 2 and 5 into the same factor according 
to Condition 3: all the successors of the predecessors of 5 are successors 
of 5 itself. It is not possible, however, to combine node 6 into the same 
factors as nodes 3 and 4 because this rule is violated. Figure 6 illustrates 
a case where it would be possible to merge node 5 into the same factor as 
nodes 2 and 3, but not all the predecessors of 5 are numbered when node 5 
becomes eligible (node 4 is not yet numbered when node 5 is considered). 
This violates Condition 2. Thus, node 5 must be in a different factor. 

Theorem 6 Procedure RPl solves Problem 2 . 
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Figure 6: Illustration of Condition 2. A node most be excluded from a factor 
because some of its predecessors are not numbered. One predecessor of node 
5 (node 4) h a* not been n umb ered by the time node 5 becomes eligible to 
join the factor containing nodes 2 and 3. 


Proof: By induction on n. Consider any permutation T and partition Lr = 
rxr r = nzu P fc with defined b y ( 4 ) “d (5), and with P k invertible in 
place. We claim that the partition point e x is no larger than that produced 
by RP1. For it follows from the construction of P\ that any vertex of G(L ) 
not included in Si by RP1 either has an unnumbered predecessor, or else 
its inclusion renders Gs x not closed. Hence the subset S\ selected by RP1 
is maximal . Now, by the inductive hypothesis, RP1 partitions G\S\ X using 
TTi — 1 invertible- in-place factors. Moreover, by Lemma 4, a graph obtained 
by adding source vertices to G \ S\ requires at least m — 1 factors in any 
optimal partition. Thus, if we take a subset S\ of S\ as the first factor, we 
can achieve nothing better. D 

We would lik e to insure that running tune is bounded by a small constant 
multiple of r)(L ). But the running time of Algorithm BP1 is large in some 
cases. Consider a dense lower triangular matrix of order n. RP1 takes 0(n 3 ) 
time in this case, since the cost of checking whether all successors of vertex 
j are also successors of its predecessors is 0 ( j{n — j)). 

We now introduce two new data structures in order to improve the effi- 
ciency of EP1. We can improve the performance of BP1 (to 0(r/(X)) for a 
dense matrix) by insuring that a vertex is not examin ed for possible inclu- 
sion in S k until all of its predecessors have been numbered. To do so, for 
every vertex we count the number of its unnumbered predecessors. Initially, 
this is its indegree. When the count reaches zero we can consider the vertex 
for inclusion in a factor. 

Second, we rRTi avoid much of the work associated with the checking at 
Condition 3 of RP1. Let u and v be numbered vertices both of which have 
been included in the current factor S*. Assume that v is a successor of u. 
Then we must have that madj(v) C madj(u), otherwise v would have failed 
the test at Condition 3. Thus, we need not consider vertex u when applying 
the requirements of Condition 3 to a vertex that is also a successor of t>. We 
shall make use of this in the faster implementation (RP2, below) by keeping 
track of the set of predecessors of each vertex that may need to be e xam i n ed 
in checking Condition 3. In the situations above, when v is included in S k 
we remove u from the predecessor sets of v’s successors, thus avoiding the 
unnecessary checking. 
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Algorithm RP2 (Re-order, Permute 2): 
Input : A directed, closed acyclic digraph G{L). 
Output: A permutation and a partition of L. 


forall v € V do 

pred(v) «- {u | L m ± 0}; 
count (u) <— indegree(v); 

Compute level(v); 

od 

max-level *— max* € y(level(v)); 

*♦-1; Jfe+-1; e 0 «- 0; 

E «— {w € V | count(v) = 0}; 
while t < n do 

Sk *- 0 ; e k *— «; 

l «— min{j | there is an unnumbered vertex at level j}; 
repeat 

for every vertex v € E at level l do 

if ( [Condition 3’] Every successor of v is a successor of all 
u € pred(v) ) then 
v(v) 4 — *; * *- i + 1; 

5 fc 4 — S k U {v}; e k i-e k + 1; 
for every successor 10 of v do 
pred(tu) 4— pred(tu) \ pred(t>); 
count (ty) 4— count(w) — 1; 
if count(w) = 0 then E *— E U {tr}; 
od 
fl 
od 

1^1+ 1; 

until l > max-level or no vertices at level l-l were included in S*; 
Pk *- £e 4 _! -“ie*; k*-k + l; 
od 


The computation of pred(v) is straightforward, requiring 0(t}(L)) time. 
Clearly, the innermost loop of Algorithm RP2 is executed tj(L) tunes. For 
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we have that the sum of count(t;) over all vertices v is just T7(X), and this 
sum decreases by one for every execution of this innermost loop. The other 
stat eme nts in the scope of the then clause are executed no more than n 
times. It is still possible that testing Condition 3’ will be costly. Indeed, 
one construct examp les for which the running time is greater than Tj(L) y 
but in practice this is unlikely to happen. 1 


4 Examples 


This section illustrates several examples of the performance of the proposed 
algorithms for matrix partitioning, and compares the behavior of the pro- 
posed algorithm with the performance of previous algorithms. Two tables 
of comparative data are presented. Table 1 uses five power system matrices 
ranging is size from 118 to 1993. Table 2 gives results for matrices arising 
from five-point finite difference discretizations. 

In each case, the original coefficient matrix is first ordered and fills are 
added to make it a perfect elimination matrix. We need to distinguish 
this first fill-reducing ordering of A from the re-ordering of L found by 
BP1 and EP2. We call the ordering of A the primary ordering. Three 
primary ordering procedures are used: the minimum degree algorithm [14], 
the multiple minimum degree (MMD) algorithm [11], and the min i m u m 
level, minimum degree (MLMD) algorithm [6]. 

For each matrix and primary ordering algorithm, two partitioning meth- 
ods are compared: Algorithm PI, which simply partitions L optimally with- 
out re-ordering it, and Algorithm RP1 which re-orders the matrix and gen- 
erates an optimal partition. In most cases, Algorithm RP1 gives a s mall er 
number of factors than PAl, while in a few cases both algorithms give the 
same number of factors. 

The ttumti observation justified by these data is that RPl reduces the 
number of factors at no expense in added fills. Its effect is most dr ama tic 
if the underlying primary ordering is the minimum degree algorithm. On 
the other hand, the best results are obtained when the MLMD algorithm 
is used for the primary ordering, even though the relative improvement 

1 Consider a graph with 3 k vertices arranged m 3 levels of k e ach* All vertices at level 
/ are adjacent to all at level / + 1, for / = 1,2. Running time is 0(iy(X) 1/3 ) again. 
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Figure 7: Five-point finite difference matrix for 10 by 10 grid. Matrix or- 
dered by the TmniTmiTn degree algorithm, then partitioned by Pi* Twenty 
factors result. 

att ainab le by Algorithm RP1 over Algorithm PI is sma l l . The results for 
the MMD algorithm fall somewhere between minimum degree and MLMD. 
The reduction in the n umb er of factors achieved by HP1 in comparison with 
PI is quite dramatic when MMD is the primary ordering. 

Figures 7 through 10 provide a different illustration of the effect of all 
four algorithmic combinations considered. All four figures use a 10 by 10 
finite difference grid. Figure 7 illustrates the original matrix ordered by 
the minimum degree method, fills added, and its partition obtained using 
Algorithm PI. Twenty factors be seen* Figure 8 illustrates the same ma- 
trix after re-ordering it by EP1. Although the matrix has been re-ordered, 
its topology is identical with that of Figure 7, but only twelve factors are 
required. Figure 7 illustrates the original matrix ordered by the MLMD 
algorithm after fills are added (its DAG is chordal). The topology of L is 
different from the one in the previous two figures. It is, in fact, a little 
denser. It can be partitioned with Algorithm PI using only nine factors. 
Finally, if the original matrix is ordered with the MLMD algorithm and 
then re-ordered and partitioned with RP1, the result is a matrix with the 
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Figure 8: Five-point fini te difference matrix for 10 by 10 grid. Matrix or- 
dered by the minimum degree algorithm, then re-ordered and partitioned 
by RPl. Twelve factors result. 
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Figure 9: Five-point finite difference matrix for 10 by 10 grid. Matrix or- 
dered by MLMD, then partitioned by PI. Nine factors result. 
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Figure 10: Five-point finite difference matrix for 10 by 10 grid. Matrix 
ordered by MLMD, then re-ordered and partitioned by RP1. Seven factors 
result. 
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same topology but where only seven factors are required, as illustrated in 
Figure 10. 

Yet another way to look at the same information is to view the elimi- 
nation trees associated with these matrices. Figure 11 illustrates the elimi- 
nation tree associated with the 10 by 10 finite difference matrix ordered by 
MLMD and partitioned by PI. The corresponding matrix for this tree is 
shown in Figure 9. The height of the elimination tree is thirty-two. But 
there is a no-fill partition with only nine factors, which makes it possible to 
aggregate elements from many levels into a single partition. If the matrix 
is re-ordered once more using RP1 and then partitioned, the height of the 
tree and its shape remain identical, as illustrated in Figure 12, but greater 
grouping from among different levels is possible, resulting in only seven fac- 
tors. In these figures, the elements from each partition are distinguished by 
using different shapes as well as shading. 

To illustrate the very different kinds of tree shapes attainable, Figure 13 
illustrates the partitioned factorisation tree associated with the primary 
MLMD ordering of the size 118 power system example, re-ordered and par- 
titioned by RP1. 

The reader may wonder about the relationship of the present work and 
earlier work on exploitation of dense submatrices on sparse elimina tion. This 
work is all related to clique structures in the graph of L . (A clique is vertex 
subset that is completely interconnected. A maximal clique is a clique that 
is not a proper subset of another clique. A simplicial clique of G(L + L T ) is 
a clique all of whose members have exactly the same nonclique neighbors.) 
Pothen and other authors give a number of applications of clique and clique 
tree representations of sparse matrix factors. [12] The partitioning of this 
paper is not the same as the partitioning into simplicial cliques or supemodes 
that has appeared previously. Indeed, all members of a simplicial clique of 
G(L) are included in the same factor by Algorithms RP1 and RP2, but 
the converse is not true. Several simplicial cliques may belong to the same 
factor, as illustrated in Figure 13. The ten nodes denoted by shaded squares 
in this figure r ^ Ti all be included in a single factor, but they do not constitute 
a simplicial clique. Figure 14 illustrates the clique tree corresponding to the 
elimination tree of Figure 13. 
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Figure 11: Elimination tree for finite difference grid matrix with MLMD 
primary ordering and partitioned using PI. The nine partitions are denoted 
by different shapes and shadings of the nodes (a tenth partition involving 
the last node only is trivial). 
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Figure 12: Elimination tree for finite difference grid matrix with MLMD 
primary ordering, re-ordered and partitioned using RP1. Seven factors. 
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Figure 13: Elimination tree for 118 node power system matrix with MLMD 
primary ordering, re-ordered and partitioned using RPl. Five factors. 


Figure 14: Clique tree for 118 node power system matrix with MLMD pri- 
mary ordering, re-ordered and partitioned using RP1. This matrix has 99 
TnAYirrml cliques and seven levels. 


Table 1: Comparison of the number of partitions using five power system 
matrices, three initial ordering algorithms, and simple partitioning (PI) or 
partitioning after reordering with RP1. 



Min. Degree 

MMD 

MLMD 

Size 

■a 


pi 

RP1 

PI 

RP1 | 

118 

53 

14 

10 

10 

6 


352 

132 

21 

13 

12 

8 

8 

707 

213 

26 

23 

18 

11 

10 

1084 

309 

26 

33 

24 

14 

11 

1993 

563 

35 

41 

25 

15 

15 


Table 2: Comparison of the number of partitions using five-point finite dif- 
ference matrices for a square grid, three initial ordering algorithms, and 
simple partitioning (PI) or partitioning after reordering with RP1. 




MMD 

MLMD 

Grid Size 

pi 

■2d 

PI 

RP1 

pi 

in 

5 by 5 

10 

7 

6 

6 

5 

5 

10 by 10 

20 

12 

15 

11 

9 

7 

15 by 15 

20 

12 

16 

14 

11 

8 
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Conclusions 


An algorithm for no-fill partitioning of lower triangular matrices with the 
fewest possible factors has been presented and proven to be optimal. This 
should reduce the number of steps required for the parallel solution of sparse 
triangular matrices when the same matrix is repeatedly used. 

We have done a number of experiments with the triangular factors of 
sparse matrices, A. The structure of these factors is influenced by the or- 
dering of rows and columns of A. (This ordering is chosen to reduce fills 
during the factorization process.) For such triangular factors, the ordering 
used in the factorization strongly influences the benefits attainable by the 
proposed algorithm. If L is constructed with the minimum degree algorithm, 
the minimum no-fill partition of L (without a change of ordering) tends to 
have a large number of factors. This number can be reduced significantly 
with the proposed re-ordering and partitioning algorithm. The same is true 
if the primary ordering is MMD. The results are better in this case than 
those obtained with primary minimum degree orderings. If A is ordered by 
the MLMD algorithm, the number of factors obtained without a change in 
the ordering is considerably smaller. The re-ordered partitioning algorithm 
can reduced further the number of factors, but its effect is not as dramatic 
as in the minimum degree case. Nevertheless, the total number of factors 
when the MLMD ordering of A is used is smaller than that if the minimum 
degree or MMD algorithm is used. 

We also have provided a graphic illustration of the effect of partitioning 
on elimination trees and on clique trees. All the experiments and illustra- 
tions were done with the aid of the first author's Sparse Matrix Manipulation 
System [1]. 
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